PREFACE

The educational experiment reported in this bulletin
was initiated by the former Director of the Bureau
of Educational Research, and the data were collected under
his supervision. The present Director of the Bureau
is responsible for the tabulation of the data and for
the preparation of this report.

This  investigation  was  made  possible  through  the 
cooperation  of  Superintendent  Peter  A.  Mortenson  and 
of  certain  principals  and  teachers  of  the  Chicago  Public 
Schools. Not only did they cooperate in the collection
of the data, but they also made substantial contributions
to the project by supplying test materials. The
writer  is  glad  to  acknowledge  the  indebtedness  of  the 
Bureau  of  Educational  Research  to  all  who  contributed 
to  this  project. 

Walter  S.  Monroe,  Director 
November   10,   1922 


Relation  of  Sectioning  a  Class  to  the 
Effectiveness  of  Instruction 

The problem. The purpose of this educational experiment was
to determine the relative effect upon the achievements in certain
school subjects of three plans of sectioning a class. A "class" is
defined as the total number of children assigned to a teacher for
instruction even though they may be divided into two or more groups
for  instructional  purposes.  The  three  plans  of  sectioning  a  class 
considered  in  this  investigation  are:  (1)  teaching  a  class  as  a  single 
unit;  (2)  dividing  the  class  into  two  equal  groups  approximately 
equivalent  with  respect  to  general  intelligence;  (3)  dividing  the  class 
into  three  equal  groups  approximately  equivalent  with  respect  to 
general  intelligence.  When  a  class  is  taught  as  one  group,  all  of  the 
pupils  recite  at  the  same  time.  Following  the  recitation  there  is  a 
period for study. Thus under this plan the work of the teacher
alternates between "hearing classes" and supervising the study of
the pupils. When a class is taught as two sections, one group
recites while the other group studies. In this case the teacher's
time is almost wholly devoted to "hearing classes." Any supervision
of the study of the pupils is of necessity given incidentally and at
irregular  intervals  when  the  teacher  is  fortunate  enough  to  have  a 
few  minutes  of  leisure  during  a  recitation  period.  When  a  class  is 
divided  into  three  sections,  the  conditions  are  much  the  same  except 
that  necessarily  the  length  of  the  recitation  periods  is  reduced.  In 
general  pupils  of  one  section  study  during  the  recitation  periods 
of  the  other  two  sections. 

The  specific  problem  of  this  investigation  was  to  determine 
the  relative  effect  of  these  three  plans  of  sectioning  a  class  upon  the 
direct  results  of  instruction  in  certain  school  subjects.  In  other 
words  this  investigation  sought  to  answer  the  question,  "Which  is 
the  best  plan  of  sectioning  a  class?" 

General  plan  of  the  experiment.  If  it  were  possible  to  secure 
three  groups  of  classes  so  that  all  factors  which  affect  the  results  of 
instruction  were  equivalent  in  the  beginning  of  the  experiment  and 
could be controlled throughout the experimental period, the simplest
procedure would be to have one group of classes taught as a unit,
another group taught in two sections, and a third group in three sections.
However, it would be difficult, if not impossible, to secure
exact  equivalence  of  teaching  ability  and  of  pupil  material.  Our 
facilities  for  measuring  the  ability  of  teachers  are  extremely  crude 
and  at  best  it  would  be  difficult  to  demonstrate  that  any  differences 
found  in  the  results  of  instruction  were  not  produced  largely  by 
differences  in  teaching  ability.  It  is  true  that  we  have  a  number  of 
general  intelligence  tests  which  might  be  used  to  measure  the  quality 
of  the  pupil  material.  However,  the  limitations  of  these  instruments 
are  such  that  one  would  be  unable  to  interpret  small  differences  in 
the  resulting  achievements. 

In  order  to  avoid  these  two  difficulties  this  experiment  was 
planned  so  that  the  same  teacher  should  instruct  a  given  class  when 
organized  according  to  two  different  plans  of  sectioning.  This, 
necessarily, must be done during successive semesters. This procedure
insured the constancy of the teacher, although not necessarily
of  teaching  ability  since  the  ability  of  a  given  teacher  may  vary 
from  semester  to  semester  with  different  types  of  class  organization. 
In  order  that  the  pupil  material  might  be  the  same  for  the  two  plans 
of  class  organization  one  hundred  percent  promotion  was  secured 
at  the  middle  of  the  school  year.  Thus,  a  teacher  who  instructed 
a  class  as  one  section  during  the  first  semester  of  this  experiment 
instructed  the  same  pupils  during  the  second  semester  but  with  the 
class  divided  into  two  or  three  sections.  Other  teachers  taught 
classes  organized  according  to  other  combinations  of  sectioning. 

This general plan of the experiment makes the semester a variable
factor. It is possible that pupils may normally make greater
progress during one semester than during the other. Furthermore,
the gain of second trial scores over first trial scores is likely to be
much greater than the gain of third trial scores over second trial
scores simply because the pupils become acquainted with the testing
procedure. In order to balance these two variable factors it was
necessary to arrange experimental groups in pairs. Thus, corresponding
to an experimental group of classes which was taught as a
single section during the first semester and as three sections during
the second semester, there was another group of classes taught as
three sections during the first semester and as a single section during
the second semester. In dividing a class into sections the scores
yielded by the general intelligence tests were used to secure sections
of approximately equivalent pupil material. Six experimental
groups of classes were organized as follows:




Group  I.  Classes  taught  as  a  single  section  during  the  first 
semester  and  as  three  sections  during  the  second  semester. 

Group  II.  Classes  taught  as  three  sections  during  the  first 
semester  and  as  one  section  during  the  second  semester. 

Group III. Classes taught as one section during the first
semester  and  as  two  sections  during  the  second  semester. 

Group  IV.  Classes  taught  as  two  sections  during  the  first 
semester  and  as  one  section  during  the  second  semester. 

Group  V.  Classes  taught  as  two  sections  during  the  first 
semester  and  as  three  sections  during  the  second  semester. 

Group  VI.  Classes  taught  as  three  sections  during  the  first 
semester  and  as  two  sections  during  the  second  semester. 
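
The pairing of the six groups can be checked mechanically. The following is a minimal sketch, not part of the original report: the names GROUPS and counterpart are hypothetical, and each plan is labelled simply by its number of sections. It confirms that the six groups cover every ordered pair of distinct plans and that each group has exactly one counterpart in which the two plans occur in reverse order.

```python
from itertools import permutations

# Each group is (sections in first semester, sections in second semester),
# transcribed from the list of Groups I to VI. Names are illustrative only.
GROUPS = {
    "I": (1, 3),
    "II": (3, 1),
    "III": (1, 2),
    "IV": (2, 1),
    "V": (2, 3),
    "VI": (3, 2),
}


def counterpart(group):
    """Return the group that received the same two plans in reverse order."""
    first, second = GROUPS[group]
    for name, order in GROUPS.items():
        if order == (second, first):
            return name
    raise ValueError(f"no counterpart for group {group}")


# The six groups exhaust every ordered pair of distinct plans.
covers_all_orders = set(GROUPS.values()) == set(permutations((1, 2, 3), 2))

# Collect each balanced pair once.
pairs = sorted({tuple(sorted((g, counterpart(g)))) for g in GROUPS})
print(covers_all_orders, pairs)
```

Under these labels the balanced pairs are Groups I and II, III and IV, and V and VI, which is the pairing used when the gains are compared.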

So far as the writer knows, essentially the same methods of
instruction and subject-matter were followed in all of these groups.
The investigation was confined to Grades II, V, and VII in order to
reduce the labor and expense. As these grades are fairly representative
of the three divisions of the elementary school (primary, intermediate,
and grammar), it is not likely that different results would be
obtained in the other grades. The number of classes, the total
enrollment, and the number of complete records in each experimental
group are given in Table I.

TABLE I. NUMBER OF CLASSES, TOTAL ENROLLMENT, AND NUMBER
OF COMPLETE RECORDS IN EACH OF THE EXPERIMENTAL GROUPS

                                           Group
Grade                          I     II    III     IV      V     VI   Total

II     Number of classes       7      4      3      6      7      3      30
       Total enrollment      348    201    138    288    324    162    1461
       Complete records      240    111    103    208    224     89     975

V      Number of classes       2      2      8      4      4      4      24
       Total enrollment       87     92    379    192    196    181    1127
       Complete records       70     72    326    133    157    143     901

VII*   Number of classes       3      3      5      5          2         18
       Total enrollment      141    140    244    214         91        830
       Complete records      119    109    186    159         86        659

*Only five group entries are given for Grade VII. They are shown in the
order given; the fifth belongs to either Group V or Group VI, and which
of the two cannot be determined from the table as it survives.

The data collected. Through the cooperation of Superintendent
Peter A. Mortenson of the Chicago Public Schools and of certain
principals and teachers, the Bureau of Educational Research carried
on this investigation during the school year of 1920-21. Experimental
classes were organized in sixteen elementary schools.¹ For
measuring the general intelligence of the pupils the Pressey
Primer Scale was used in the second grade, and the Illinois General
Intelligence Scale in the other two grades. The achievements of the
pupils in the second grade were measured by means of the Pressey
Scale of Attainment No. 1. In the fifth and seventh grades achievements
were measured by Monroe's Standardized Silent Reading
Tests, Revised, Monroe's General Survey Scale in Arithmetic, and
Buckingham's Problem Scale in Arithmetic, Divisions 1 and 2. The
general intelligence tests were given only at the beginning of the
experiment, October 11, 1920. Form 1 of the achievement tests was
given at this time. Form 2 of the achievement tests was administered
at the close of the first semester, February 3, 1921. At the
close of the experimental period, May 11, 1921, Form 1 was again
given.

The tests were administered by the teachers, who also scored the
test papers and entered the scores upon individual record cards.
This, however, was done only after all of the teachers involved in the
experiment had been called together for the purpose of acquainting
them with the tests. In this explanation several tests were administered
to the teachers in exactly the same way as they were to be
administered to the pupils. In addition detailed instructions were
supplied to the teachers for all steps of the work. Since no comparisons
were made between the scores yielded by tests administered by
different teachers it is felt that this procedure in the administration
of the tests does not seriously affect the results of the experiment.

Limitations of the experiment to be kept in mind in interpreting
the results. A number of conditions must be kept in mind in interpreting
the results. In the first place practically all of the teachers
who cooperated in the investigation had been accustomed to teaching
classes in two sections. A few, perhaps 1 in 20, had taught a class as
a single section but, so far as the writer was informed, no teacher
had had any experience in instructing a class in three sections. Thus,
it is altogether likely that most of the teachers had acquired a technique
of instruction which would prove more successful with a class
divided into two sections than with a class taught as one section
or divided into three. Furthermore, there appears to be a prejudice
against the division of a class into three sections. Thus, there is
introduced a factor which may be expected to produce greater achievements
in classes taught as two sections than in classes taught as
either one or three sections. The effect of this factor is unknown,
however, and it should by all means be recognized in interpreting
the results.

¹These sixteen schools were the following: Brown, Dante, Douglas, Fiske, Jenner,
Julia Ward Howe, Morse, Otis, Pullman, Scanlan, Shields, Spry, Van Vlissingen, Ward,
Wentworth, and West Pullman.

The  instruments  used  for  measuring  the  achievements  of  the 
pupils  do  not  measure  all  achievements  resulting  from  instruction. 
They  can  be  considered  to  do  no  more  than  measure  representative 
samples  of  the  achievements  within  their  respective  fields.  Outside 
of  silent  reading  and  arithmetic,  in  which  tests  were  given,  there  are 
many  important  achievements  of  which  no  attempt  was  made  to 
secure  direct  measurements.  It  is,  of  course,  possible  that  the 
measures  of  achievements  secured  correlate  closely  enough  with  all 
other  achievements  resulting  from  instruction,  that  a  sufficiently 
accurate index of all achievements is furnished for judging the relative
effectiveness of the instruction in the different experimental
groups.  However,  convincing  experimental  evidence  on  the  point 
is  wanting  and,  for  this  reason,  due  caution  must  be  exercised  in 
extending  the  conclusions  of  this  experiment  to  school  subjects  other 
than  silent  reading  and  arithmetic,  as  well  as  to  the  more  subtle 
outcomes  engendered  by  the  social  contacts  of  the  school  room. 

Finally,  it  must  be  remembered  that  this  investigation  was 
carried  on  in  classes  enrolling  approximately  45  pupils.  Hence 
it  does  not  necessarily  follow  that  the  conclusions  would  apply  to 
classes  enrolling  20  to  30  pupils.  It  is  possible  that  this  change  in 
the  size  of  class  might  produce  a  complete  reversal  in  the  conclusions. 

Method of summarizing data. After rejecting records which
were incomplete or obviously inaccurate, the scores yielded by an
application of a test were combined in a total distribution for each
experimental group. Thus, a distribution was formed of the first
trial scores made on Monroe's Standardized Silent Reading Tests,
Revised, by the group of fifth grade pupils enrolled in "classes taught
as a single section during the first semester and as three sections during
the second semester." In the same way distributions of scores
were formed for each of the experimental groups and for each application
of the test. The gain in achievement during the first semester
was found by subtracting the average score for the first trial of a
test from the average score of the second trial. The gain for the
second semester was found by subtracting the average score of the
second trial from that of the third trial. A second measure of gain
was secured by following a similar procedure with the median scores,
but these gains are not given in this report as they were, in general,
in agreement with those calculated from the average scores.

In  calculating  these  gains  no  account  was  taken  of  the  possible 
non-equivalence  of  the  different  forms  of  the  tests  used.  In  fact  no 
accurate  information  concerning  the  equivalence  of  duplicate  forms 
is  available  except  for  Monroe's  Standardized  Silent  Reading  Tests, 
Revised,  and  for  Monroe's  General  Survey  Scale  in  Arithmetic. 
The duplicate forms of these two tests have been shown to be approximately
equivalent.² However, since Form 1 of each test was used
twice and the average scores calculated from it were used both as
subtrahends and minuends, and since the gain for any plan of sectioning
is computed from both semesters, the non-equivalence of Forms 1
and 2 of the tests used will not affect the comparisons of gains made
in the following tables.

The  point  scores  yielded  by  the  different  tests  are  expressed  in 
terms  of  different  units  and  from  different  zero  points.  Thus  before 
any  combination  from  the  results  of  the  different  tests  can  be  made 
it  is  necessary  to  express  the  gains  in  terms  of  a  common  unit.  The 
usual  assumption  in  such  cases  is  that  the  standard  deviation  of  the 
distribution  of  scores  represents  the  same  increment  of  ability  for 
one test as for another. On the basis of this assumption a total distribution
for each test was secured by adding the distributions of
the  six  experimental  groups  within  a  grade.  This  was  done  for  the 
scores  secured  at  each  period  of  testing.  The  average  of  the  three 
standard  deviations  was  assumed  to  represent  the  same  increment  of 
ability  for  each  test  and  was  used  as  a  divisor  for  reducing  the  gains 
to  the  basis  of  a  common  unit.  For  example,  during  the  first  semester 
the  fifth  grade  pupils  in  Group  I  classes  made  a  gain  in  arithmetic  of 
23.82  points.  During  the  second  semester  they  made  a  gain  of  21.5 
points.  The  average  standard  deviation  of  the  arithmetic  scores 
in  the  fifth  grade  is  19.65.  Using  this  as  a  divisor  we  secure  as 
quotients  1.21  and  1.09.  In  this  manner  the  entries  in  Tables  II, 
III  and  IV  were  obtained.  The  two  quotients  whose  calculation  was 
explained  are  given  in  Table  III. 
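
The reduction to a common unit described above amounts to a single division. The following is a minimal sketch using only the three figures quoted in the text; the function name standardized_gain is hypothetical and not taken from the report.

```python
def standardized_gain(gain_in_points, average_sd):
    """Express a gain in point scores as a multiple of the average standard deviation."""
    return gain_in_points / average_sd


# Figures quoted in the text for fifth grade arithmetic, Group I.
AVERAGE_SD = 19.65
first_semester = standardized_gain(23.82, AVERAGE_SD)
second_semester = standardized_gain(21.5, AVERAGE_SD)

print(round(first_semester, 2), round(second_semester, 2))
```

Rounded to two places, the quotients agree with the 1.21 and 1.09 reported in the text.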

Tables II, III and IV are similar in structure and are to be read
in the same way. The gains for the different experimental groups
are arranged in pairs. In Table II, the gain for Group I on Test 1
when taught in classes of one section is 1.42. When taught in three
sections the gain is .55. The gain for Group II classes when taught
in one section is .90 and when taught in three sections it is 1.11.
The Group I classes were taught in one section during the first
semester but the Group II classes were taught in one section during
the second semester. This difference in time is largely responsible
for the differences in the size of the gains.

²Monroe, W. S. Illinois Examination, University of Illinois Bulletin Vol. 19, No.
9, Bureau of Educational Research Bulletin No. 6. Urbana: University of Illinois,
1921. 70 p.

Interpretation of results. In interpreting the gains in Tables
II, III and IV it is necessary to keep in mind both the constant and
variable errors of measurement which are involved in the original
data as well as the chance variations in the gains due to sampling.
The variable errors of measurement in the original data depend upon
the reliability of the tests used. If we assume a coefficient of
reliability³ of .84 for Test 1, it can be shown that the probable variable
error of measurement is approximately .25 when expressed in terms
of sigma, which is the unit used in expressing the gains in Tables II,
III, and IV.⁴ A probable error of measurement of .25 means that
the scores for 50 percent of the pupils involve variable errors which
are less than .25. For the other 50 percent the variable errors will
be greater than .25. The presence of variable errors of measurement
affects the average of the scores as shown by the following formula
in which N is the number of scores upon which the average is based.

\[ \mathrm{P.E.}_{M\ \mathrm{average}} = \frac{\mathrm{P.E.}_{M}}{\sqrt{N}} \]

Substituting in this formula for Group I, we find the probable error
of measurement of the average (P.E.m average) is .017; for Group II
it is .024. The gain 1.42 is the difference between the two averages.
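
The two steps just described can be traced numerically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, sigma is taken as 1, and the numbers of scores (240 for Group I and 111 for Group II) are assumed to be the Grade II complete records of Table I, on the inference that Table II reports the second grade.

```python
import math


def pe_measurement(sigma, reliability):
    """Probable variable error of measurement of a single score."""
    return 0.6745 * sigma * math.sqrt(1.0 - reliability)


def pe_of_average(pe_single, n):
    """Probable error of measurement of an average based on n scores."""
    return pe_single / math.sqrt(n)


# Reliability of .84 and sigma of 1, as assumed in the text.
pe_single = pe_measurement(1.0, 0.84)

# Assumed numbers of scores: Grade II complete records for Groups I and II.
for label, n in (("Group I", 240), ("Group II", 111)):
    print(label, round(pe_of_average(pe_single, n), 3), round(pe_of_average(0.25, n), 3))
```

The formula gives about .27 for a single score, which the text rounds to "approximately .25". Depending on which of the two values is carried forward, the results fall between .016 and .017 for Group I and between .024 and .026 for Group II, so the reported .017 and .024 lie within the range that rounding allows.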


³The coefficient of reliability assumed here is probably higher than would be found
for this test. When based upon the scores of a single grade, the coefficient of
reliability for Monroe's General Survey Scale in Arithmetic is approximately .85. For
Monroe's Standardized Silent Reading Test 1, Revised, the coefficients of reliability
are approximately .75 for rate and .65 for comprehension. For Test II they are
about .08 higher. The reliability of the other tests is not known.

⁴The formula for the probable variable error of measurement is

\[ \mathrm{P.E.}_{M} = .6745\,\sigma\sqrt{1 - r} \]

in which r is the coefficient of reliability. In this case σ = 1.
The probable error of the difference of the two averages is given by
the following formula

\[ \mathrm{P.E.}_{\mathrm{Dif.}} = \sqrt{\mathrm{P.E.}_{1}^{2} + \mathrm{P.E.}_{2}^{2}} \]

In this formula P.E.1 and P.E.2 stand for the probable errors of
measurement of the two averages whose difference is taken. In this
case P.E.1 is equal to P.E.2 since we have used the average of the
standard deviations of the several distributions in reducing the gains
to a comparable basis. Applying the above formula, we find that
the probable variable error of measurement to be associated with 1.42
is .024 and with .90 is .034. The formula for the probable error of
the sum of the two averages is the same as that for their difference.
Hence we may calculate the probable error of measurement to be
associated with the average gain 1.16 by taking one half of the
probable error of measurement of the sum of the two averages. The
P.E.m of the average gain 1.16 is .020.

Since  the  probable  variable  error  of  measurement  depends  only 
upon  the  magnitude  of  the  standard  deviation  of  the  scores  and  the 
number  of  scores,  we  will  obtain  the  same  result  for  the  gains  of  these 
two  groups  when  taught  in  classes  of  three  sections.  The  probable 
variable error of measurement of the difference (.33) may be calculated
by the formula given above. It is .028.
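
The chain of probable errors for Groups I and II can be followed step by step. The following is a minimal sketch with hypothetical function names; it starts from the rounded values .017 and .024 given in the text and combines them as the square root of the sum of squares.

```python
import math


def pe_of_difference(pe_1, pe_2):
    """Probable error of the difference (or sum) of two averages."""
    return math.sqrt(pe_1 ** 2 + pe_2 ** 2)


def pe_of_mean_of_two(pe_1, pe_2):
    """Probable error of the average of two quantities: half that of their sum."""
    return 0.5 * pe_of_difference(pe_1, pe_2)


# Each gain is the difference of two averages with equal probable errors.
pe_gain_group_1 = pe_of_difference(0.017, 0.017)
pe_gain_group_2 = pe_of_difference(0.024, 0.024)

# Average gain 1.16 is the mean of the gains 1.42 and .90.
pe_average_gain = pe_of_mean_of_two(0.024, 0.034)

# Difference .33 is taken between two average gains, each with P.E. of .020.
pe_final_difference = pe_of_difference(0.020, 0.020)

print(round(pe_gain_group_1, 3), round(pe_gain_group_2, 3),
      round(pe_average_gain, 3), round(pe_final_difference, 3))
```

The first two results and the last agree with the reported .024, .034 and .028. The third comes to about .021 where the text gives .020, a difference that presumably arises from rounding at an intermediate step.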

This  probable  variable  error  of  measurement  is  relatively  small 
in  comparison  with  the  gain  .33,  and  in  general  when  an  average  or 
difference  is  three  or  four  times  its  probable  error  it  can  be  considered 
significant.  Hence,  if  we  had  to  consider  only  the  variable  errors  of 
measurement  we  would  be  justified  in  asserting  that  this  difference 
was  significant  and  could  not  be  due  to  the  presence  of  these  errors 
in  our  original  data.  However,  it  should  be  remembered  that  we 
have  been  liberal  in  the  estimate  of  the  coefficient  of  reliability. 
It  is  likely  that  the  true  value  of  the  probable  error  is  much 
larger. 

Since all gains are expressed in terms of a common unit the probable
variable errors of measurement found for the entries under Test 1
will apply also to Tests 2, 3, and 4 provided we assume the same
coefficient of reliability for these tests. The probable variable error of
measurement of the average is affected by the number of cases from
which the average is computed. Hence for the gains made by other
groups it will be slightly greater, since the number of scores is smaller
for those groups. In Table III the number of scores in Groups III
and IV is slightly larger. Hence a smaller probable variable error
of measurement will be found, but for all of the other groups it will
be larger than the one which we have considered in detail. In several
cases the difference in gains is so small that when compared with the
probable variable error of measurement it cannot be considered as
significant.

In addition to the variable errors of measurement, it is necessary
to consider the chance variations in the gains due to sampling even
when the sample has been chosen without bias. The probable error
of an average due to sampling is given by the following formula

\[ \mathrm{P.E.}_{S} = .6745\,\frac{\sigma_{\mathrm{dist.}}}{\sqrt{N}} \]

Since sigma (σ) has been used as a unit in terms of which the gains
are expressed, σdist. equals 1 for our calculations.⁵ In the case
of Group I, P.E.s = .044. The gain 1.42 is the difference between
two averages and hence it would be necessary to apply the formula
for the probable error of the difference of the two averages. This
being done we find that the P.E.s to be applied to the gain
(1.42) is .062. In case of Group II, P.E.s = .064 and for the difference
between the two averages it is .090. For the average 1.16,
P.E.s = .055. For the difference .33, P.E.s = .078.
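
The sampling errors quoted above can be reproduced in the same manner. The following is a minimal sketch, again with hypothetical names and with the numbers of scores (240 and 111) assumed from the Grade II complete records of Table I; sigma is taken as 1 as stated in the text.

```python
import math


def pe_sampling(n, sigma=1.0):
    """Probable error of an average due to sampling."""
    return 0.6745 * sigma / math.sqrt(n)


def combine(pe_1, pe_2):
    """Probable error of the difference (or sum) of two quantities."""
    return math.sqrt(pe_1 ** 2 + pe_2 ** 2)


pe_group_1 = pe_sampling(240)                      # one average, Group I
pe_group_2 = pe_sampling(111)                      # one average, Group II
pe_gain_1 = combine(pe_group_1, pe_group_1)        # gain 1.42
pe_gain_2 = combine(pe_group_2, pe_group_2)        # gain .90
pe_average_gain = 0.5 * combine(pe_gain_1, pe_gain_2)             # average 1.16
pe_final_difference = combine(pe_average_gain, pe_average_gain)   # difference .33

for value in (pe_group_1, pe_group_2, pe_gain_1, pe_gain_2,
              pe_average_gain, pe_final_difference):
    print(round(value, 3))
```

Carried through without intermediate rounding, the figures agree with the text to within one unit in the third decimal place; the final value comes to about .077 against the reported .078.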

When we consider the probable error due to sampling (.078)
in addition to the probable variable error of measurement (.028) the
difference (.33) would probably be significant and indicate a slight
superiority in achievement as measured by Test 1 for the pupils
taught in classes of one section, provided no other errors could be
considered to affect this difference. It is, however, necessary to
consider the constant errors of measurement. Their exact magnitude
can not be known but their presence is evident. For example,
in Table II the gains on Test 1 for Groups I and II when taught as
one section are 1.42 and .90 respectively. The gain of 1.42 was made
during the first semester and is the difference between the first and
second trial scores. The gain of .90 was made during the second
semester and is the difference between the second and third trial
scores. Due to the pupils becoming acquainted with the tests and
the testing procedure, both of these gains involve a constant error.
This tends to make the obtained gain larger than the true gain, but
as the practice effect of the second trial scores over the first trial
scores is larger than that of the third trial over the second trial scores,
it is reasonably certain that the gain for Group I (1.42) contains the
larger constant error. The gains made by these two groups when
taught in classes of three sections are .55 and 1.11. Both of these
gains involve a constant error but in this case the larger constant
error is found in the gain for Group II. Each of the average gains
for these two groups (1.16 and .83) includes a relatively large constant
error but the two errors are much more nearly equal than those
included in the gains for each group separately. Hence, we are probably
justified in considering their difference (.33) to be relatively
unaffected by the presence of constant errors in any of our original data.

⁵This is not the true value of σ. The variable errors of measurement tend to
increase the value of the obtained sigma. The relation is given by the formula

\[ \sigma_{\mathrm{true}} = \sigma_{\mathrm{obtained}}\sqrt{r} \]

in which r is the coefficient of reliability.
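
The neutralization of constant errors in a balanced pair of groups may be illustrated with a small calculation. The following is a minimal sketch: the first part uses the four gains reported for Groups I and II on Test 1, while the second part rests on a purely hypothetical additive model in which an observed gain is the sum of a plan effect and a constant error belonging to the semester. The effect and error values are invented for illustration and are not estimates from the data.

```python
def average(a, b):
    return (a + b) / 2


# Reported standardized gains on Test 1 (Table II), by plan and semester.
one_section = {"first": 1.42, "second": 0.90}     # Group I, then Group II
three_sections = {"first": 1.11, "second": 0.55}  # Group II, then Group I

avg_one = average(one_section["first"], one_section["second"])
avg_three = average(three_sections["first"], three_sections["second"])
reported_difference = avg_one - avg_three
print(round(avg_one, 2), round(avg_three, 2), round(reported_difference, 2))

# Hypothetical additive model: observed gain = plan effect + semester error.
effect_one, effect_three = 0.80, 0.50   # invented "true" gains
error_first, error_second = 0.60, 0.10  # invented constant errors

model_one = average(effect_one + error_first, effect_one + error_second)
model_three = average(effect_three + error_first, effect_three + error_second)
model_difference = model_one - model_three
print(round(model_difference, 2), round(effect_one - effect_three, 2))
```

The reported averages come to 1.16 and .83 and their difference to .33. In the hypothetical model each average carries the same mean constant error, so the difference of the averages equals the difference of the plan effects. The cancellation holds only so far as the constant errors are additive and alike for the two groups of a pair.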

However, the neutralization of the constant errors which seems
plausible, if not probable, in the case we have just considered does
not appear to have taken place in a number of the other differences
in this group of tables. With the exception of Groups I and II in
Table II, some of the differences are positive but others are negative
for each pair of groups. Although it is not impossible that a given plan
of sectioning a class might be more effective in one subject than in
another, the variations in the signs of the differences do not appear
to occur in such a way as to justify this explanation of the negative
gains. It is likely that a constant error was introduced in certain
groups of scores which was not neutralized in the difference. For
example, Group VI is shown by Test 2 to have made a larger gain
during the second semester when taught in two sections. Each of
the other tests shows a smaller gain for this semester and this we
should expect as the gain is the difference between the second and
third trial scores. The probable explanation of this condition is that
in some way a constant error was introduced in one set of scores yielded
by Test 2 for Group VI. An examination of Tables III and IV
reveals several similar instances. Hence, we are forced to the conclusion
that at least certain sets of scores involve an unknown constant
error. The fact that this happened in certain cases tends
to make one suspicious of the presence of an unknown constant error
in other sets of scores even though evidence of its presence is lacking.

It is perhaps significant that in the case of the differences in
gains between classes taught as one section and classes taught in
three sections, eight gains are positive while six are negative. The
same situation prevails with respect to the gains made by classes
taught in one section when compared with the gains made by classes
taught in two sections. For classes taught in two sections compared
with classes taught in three sections, we have records only in the
second and fifth grades. Four of the differences are positive while
five are negative.
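
How nearly these counts resemble an even division can be gauged by a simple sign count. The following is a minimal sketch of a calculation that does not appear in the original report: it supposes, as a simplification, that each difference is as likely to be positive as negative and that the differences are independent, which is doubtful since several tests were taken by the same pupils. The function name prob_at_least is hypothetical.

```python
from math import comb


def prob_at_least(k, n):
    """Chance of k or more signs of one kind among n, each kind equally likely."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Eight positive differences out of fourteen (one section vs. two or three).
p_eight_of_fourteen = prob_at_least(8, 14)

# Five negative differences out of nine (two sections vs. three sections).
p_five_of_nine = prob_at_least(5, 9)

print(round(p_eight_of_fourteen, 3), round(p_five_of_nine, 3))
```

On these assumptions a division at least as uneven as eight to six would arise by chance about two times in five, and five to four about half the time, so the signs alone give no ground for preferring one plan.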

Conclusion. The facts presented in Tables II, III, and IV and
the errors they include appear to justify the conclusion that there is
no evidence of greater achievements being made by pupils when
taught in classes organized on the basis of one plan of sectioning than
in classes organized on a different plan of sectioning. Since the teachers
were more experienced in teaching classes in two sections and
probably preferred this plan of organization this condition might
appear to mean that the division of classes into two sections was the
least efficient of the three plans. However, in the writer's judgment
this conclusion is not justified. The most obvious inference, in his
opinion, to be drawn from the data of this experiment is that the
educational tests used do not yield sufficiently accurate and precise
measures of achievement to make possible the determination, under
the conditions of this experiment, of the best method of sectioning
a class. It is likely that the differences in the gains made during a
period of less than a semester are not large. This being the case it is
necessary either to extend the experimental period or to secure more
precise measures of achievement. The magnitude of the probable
variable error of measurement of the difference and also of the probable
error due to sampling can be decreased by increasing the number
of pupils in the experimental groups, but the constant errors are not
affected by any increase in the number of cases. Certain constant
errors are neutralized in the differences but, as we have shown, other
constant errors which occur in only certain sets of scores were not
eliminated. The presence of these constant errors is due to imperfections
in the educational tests used. Therefore, it appears that
until our instruments for measuring achievements of school children
are materially improved we cannot expect such educational experiments
as the one described in this report to lead to reliable
conclusions.